 maximum score


Block Sparse Flash Attention

arXiv.org Artificial Intelligence

Modern large language models increasingly require long contexts for reasoning and multi-document tasks, but attention's quadratic complexity creates a severe computational bottleneck. We present Block-Sparse FlashAttention (BSFA), a drop-in replacement that accelerates long-context inference while preserving model quality. Unlike methods that predict importance before computing scores, BSFA computes exact query-key similarities to select the top-k most important value blocks for each query. By comparing per-block maximum scores against calibrated thresholds, we skip approximately 50% of the computation and memory transfers for pruned blocks. Our training-free approach requires only a one-time threshold calibration on a small dataset to learn the per-layer and per-head attention score distributions. We provide a CUDA kernel implementation that can be used as a drop-in replacement for FlashAttention. On Llama-3.1-8B, BSFA achieves up to 1.10x speedup on real-world reasoning benchmarks and up to 1.24x for needle-in-a-haystack retrieval tasks while maintaining above 99% baseline accuracy, with certain configurations even improving accuracy by focusing on the most relevant content, substantially outperforming existing sparse attention methods. The implementation is available at https://github.com/Danielohayon/Block-Sparse-Flash-Attention
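
The block-skipping idea in the abstract can be illustrated with a minimal sketch for a single query. This is not the authors' CUDA kernel: the function name, the single scalar `threshold`, and the plain-Python lists are assumptions made for illustration, whereas the paper calibrates thresholds per layer and per head. The sketch computes exact scores for each key block, skips the block's values when its maximum score falls below the threshold, and otherwise folds the block into a running (online) softmax.

```python
import math

def block_sparse_attention(q, key_blocks, value_blocks, threshold):
    """Attention output for one query, skipping low-scoring blocks.

    `threshold` is a hypothetical calibrated cutoff on the scaled
    query-key score; blocks whose maximum score is below it are pruned.
    Returns (output_vector, number_of_skipped_blocks).
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf          # largest score seen in kept blocks
    denom = 0.0                      # running softmax denominator
    acc = [0.0] * len(value_blocks[0][0])
    skipped = 0

    for keys, values in zip(key_blocks, value_blocks):
        # Exact scores for this block (no importance prediction).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        block_max = max(scores)
        if block_max < threshold:
            skipped += 1             # the value block is never touched
            continue

        # Online-softmax update: rescale earlier partial sums to the new max.
        new_max = max(running_max, block_max)
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * vi for a, vi in zip(acc, v)]
        running_max = new_max

    if denom == 0.0:                 # every block was pruned
        return acc, skipped
    return [a / denom for a in acc], skipped
```

Passing `threshold=-math.inf` keeps every block, so the function reduces to ordinary softmax attention, which makes it easy to compare pruned and unpruned outputs.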


ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters

arXiv.org Artificial Intelligence

The self-attention mechanism sets transformer-based large language models (LLMs) apart from convolutional and recurrent neural networks. Despite the performance improvement, achieving real-time LLM inference on silicon is challenging because Softmax is used extensively in self-attention. Apart from its non-linearity, the low arithmetic intensity of Softmax greatly reduces processing parallelism, which becomes the bottleneck, especially with longer contexts. To address this challenge, we propose Constant Softmax (ConSmax), a software-hardware co-design that serves as an efficient Softmax alternative. ConSmax employs differentiable normalization parameters to remove the maximum search and denominator summation in Softmax. It allows for massive parallelization while still performing the critical tasks of Softmax. In addition, scalable ConSmax hardware built on a bitwidth-split look-up table (LUT) can produce lossless non-linear operations and support mixed-precision computing, which further facilitates efficient LLM inference. Experimental results show that ConSmax achieves a minuscule power consumption of 0.43 mW and an area of 0.001 mm² at a 1-GHz working frequency in 22-nm CMOS technology. Compared to state-of-the-art Softmax hardware, ConSmax yields 14.5x energy and 14.0x area savings with comparable accuracy on a GPT-2 model and the WikiText103 dataset.
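
To make the contrast with standard Softmax concrete, here is a small sketch. The abstract does not give the formula, so the form `exp(s - beta) / gamma` is an assumption, and `beta` and `gamma` are hypothetical stand-ins for the learnable normalization parameters. The point is only that each element can be computed independently once the two constants are fixed, with no pass over the row to find a maximum or a sum.

```python
import math

def softmax(scores):
    """Standard Softmax: needs a max search and a sum over the whole row."""
    row_max = max(scores)
    exps = [math.exp(s - row_max) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def consmax_like(scores, beta, gamma):
    """Assumed ConSmax-style form: constants replace the row max and sum.

    beta and gamma are hypothetical learned scalars, so every element is
    independent of the others and can be computed in parallel.
    """
    return [math.exp(s - beta) / gamma for s in scores]
```

If `beta` happens to equal the row maximum and `gamma` the corresponding sum of exponentials, the two functions agree; with fixed learned constants the outputs are generally not normalized to sum to one.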


Language Decision Transformers with Exponential Tilt for Interactive Text Environments

arXiv.org Artificial Intelligence

Text-based game environments are challenging because agents must deal with long sequences of text, execute compositional actions using text and learn from sparse rewards. We address these challenges by proposing Language Decision Transformers (LDTs), a framework that is based on transformer language models and decision transformers (DTs). Our LDTs extend DTs with 3 components: (1) exponential tilt to guide the agent towards high obtainable goals, (2) novel goal conditioning methods yielding better results than the traditional return-to-go (sum of all future rewards), and (3) a model of future observations that improves agent performance. LDTs are the first to address offline RL with DTs on these challenging games. Our experiments show that LDTs achieve the highest scores among many different types of agents on some of the most challenging Jericho games, such as Enchanter.
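
For reference, the traditional return-to-go mentioned above (the sum of all future rewards at each step) can be computed with a single backward pass. This is a generic sketch, not code from the paper; the function name is illustrative and the rewards in the usage note are made up.

```python
def returns_to_go(rewards):
    """Return, for each time step, the sum of rewards from that step onward."""
    rtg = []
    total = 0.0
    for reward in reversed(rewards):   # accumulate from the end of the episode
        total += reward
        rtg.append(total)
    rtg.reverse()
    return rtg
```

With a sparse reward sequence such as `[0, 0, 10, 0, 5]`, the early steps all carry the full episode return of 15, which is the conditioning signal a decision transformer is given at those steps.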


A Hybrid Evolutionary Approach to Solve University Course Allocation Problem

arXiv.org Artificial Intelligence

This paper discusses the constraints and difficulties of the university course allocation problem, along with solutions that address them. A hybrid evolutionary algorithm is defined that combines a Local Repair Algorithm and a Modified Genetic Algorithm to generate the best course assignment. After analyzing the collected dataset, all the necessary constraints were formulated. These constraints cover the aspects that must be kept in mind while preparing clash-free and efficient class schedules for every faculty member. The goal is to generate an optimized solution that fulfills those constraints while maintaining time efficiency and reducing the workload of handling this task manually. The proposed algorithm was compared with several baseline optimization algorithms to show its better efficiency in terms of accuracy and time.


Data Leakage

#artificialintelligence

If the process of standardizing numeric data is prone to leakage, then why can't it be skipped? Equal Feature Importance -- Let's say we have two features: final_exam_score and SAT_score [USA college prep test]. On one hand, the final exam has a maximum score of 100, but, on the other hand, the SAT has a maximum score of 1600. If we don't normalize these two features based on their range of possible values, then an algorithm would initially be prone to prioritizing the SAT_score feature because of its larger values. However, if we normalize both features between 0 and 1, then they will be treated equally at the start of training. Help Prevent Gradient Explosion -- Neural networks learn better when input values are close to zero.
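
One common way to normalize without leaking information is to fit the scaling statistics on the training split only and reuse them on the test split. The sketch below uses plain Python; the helper names and the sample scores are hypothetical and are not taken from the article.

```python
def fit_min_max(train_values):
    """Learn the scaling range from the training split only."""
    return min(train_values), max(train_values)

def apply_min_max(values, low, high):
    """Scale values with a previously fitted range (no refitting)."""
    return [(v - low) / (high - low) for v in values]

# Hypothetical SAT scores, split before any statistics are computed.
sat_train = [1000, 1200, 1600]
sat_test = [1400]

low, high = fit_min_max(sat_train)            # test data is never seen here
sat_train_scaled = apply_min_max(sat_train, low, high)
sat_test_scaled = apply_min_max(sat_test, low, high)
```

A test value outside the training range would scale to below 0 or above 1; that is expected, and it is the price of keeping the test split out of the fitting step.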


Introduction to Reinforcement Learning

#artificialintelligence

The idea of CartPole is that there is a pole standing upright on top of a cart. The goal is to keep the pole balanced by moving the cart from side to side. We consider the environment won if we balance the pole for 500 frames, and we fail once the pole tilts more than 15 degrees from vertical or the cart moves more than 2.4 units from the middle position. For every frame in which the pole stays "balanced" (less than 15 degrees from vertical), our score increases by 1, and our target is a score of 500. So how can we do this?
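
The scoring rule described above can be written down directly. This sketch models only the win and fail conditions as stated in the text (15 degrees, 2.4 units, 500 frames), not the physics of the cart; the function name and the `(angle, position)` trajectory format are assumptions for illustration.

```python
MAX_ANGLE_DEG = 15.0     # fail threshold for pole tilt, as stated in the text
MAX_CART_OFFSET = 2.4    # fail threshold for cart distance from the middle
TARGET_SCORE = 500       # frames needed to count the episode as won

def episode_score(trajectory):
    """Count balanced frames in a sequence of (angle_deg, cart_x) pairs.

    Scoring stops at the first failing frame or once the target is reached.
    """
    score = 0
    for angle_deg, cart_x in trajectory:
        if abs(angle_deg) > MAX_ANGLE_DEG or abs(cart_x) > MAX_CART_OFFSET:
            break                      # episode failed on this frame
        score += 1
        if score >= TARGET_SCORE:
            break                      # episode won
    return score
```

An agent's job is then to choose cart movements so that the trajectory never hits either failure threshold before the score reaches 500.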


Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents

arXiv.org Artificial Intelligence

Text-based games simulate worlds and interact with players using natural language. Recent work has used them as a testbed for autonomous language-understanding agents, with the motivation being that understanding the meanings of words or semantics is a key component of how humans understand, reason, and act in these worlds. However, it remains unclear to what extent artificial agents utilize semantic understanding of the text. To this end, we perform experiments to systematically reduce the amount of semantic information available to a learning agent. Surprisingly, we find that an agent is capable of achieving high scores even in the complete absence of language semantics, indicating that the currently popular experimental setup and models may be poorly designed to understand and leverage game texts. To remedy this deficiency, we propose an inverse dynamics decoder to regularize the representation space and encourage exploration, which shows improved performance on several games including Zork I. We discuss the implications of our findings for designing future agents with stronger semantic understanding.


Let's Play Again: Variability of Deep Reinforcement Learning Agents in Atari Environments

arXiv.org Artificial Intelligence

Reproducibility in reinforcement learning is challenging: uncontrolled stochasticity from many sources, such as the learning algorithm, the learned policy, and the environment itself, has led researchers to report the performance of learned agents using aggregate metrics over multiple random seeds for a single environment. Unfortunately, there are still pernicious sources of variability in reinforcement learning agents that make common summary statistics an unsound measure of performance. Our experiments demonstrate the variability of common agents used in the popular OpenAI Baselines repository. We make the case for reporting post-training agent performance as a distribution, rather than a point estimate.


AI is trained to create incredibly British place names

Daily Mail - Science & tech

From Pratt's Bottom to Giggleswick, Britain is well-known for its unusual place names. But there is now an artificial intelligence programme that can generate its own. Oregon-based programmer Dan Hon, who created the programme for fun, posted a new list of British place names created by the AI on Twitter. The programme analysed thousands of British names of towns and villages and was then trained to make new ones at random. It created almost 4,500 'incredibly British' place names in total, and from 'Filton-on's Forton' to 'Grinachard St Ringley', many sound just like real places.


Microsoft AI gets maximum score possible on Ms. Pac-Man

#artificialintelligence

Humans are now second-best at playing Ms. Pac-Man, a 1980s twist on the arcade classic that involves eating pellets and being chased by ghosts. It was rated as one of the hardest games for an AI to beat, but that didn't stop one. An AI from Microsoft's Maluuba team, a Canadian deep learning startup the company acquired earlier this year, has now achieved the maximum possible score of 999,990 in the Atari game, four times the human record. This was achieved using a method of reinforcement learning called Hybrid Reward Architecture. The team taught 150 AI agents to work together in parallel to master the game.